Risk Factors Associated with Low Infant Birthweight

Group 013E03

James Finlay, Gisella Sutanto, Xiaoying Wei, Angus Vrantsis, Sophia Yuen

What are we trying to investigate?

Our Problem

What causes an infant to have a lower weight at birth?

  • Low infant birthweight is linked with higher mortality rates and long-term health issues
  • Identify risk factors so we can come up with prevention strategies
  • Help mothers give birth to healthier babies
  • Happy Baby, Happy Life

Where did we get our data from?

Information

  • Collected at Baystate Medical Center, Springfield, Massachusetts, in 1986
  • 189 samples

Variable Name   Explanation
bwt             Birth weight in grams.
low             Indicator if birth weight is less than 2.5 kg.
age             Mother's age in years.
lwt             Mother's weight in pounds at her last menstrual period.
race            Mother's race (1 = white, 2 = black, 3 = other).
smoke           Did the mother smoke during pregnancy?
ptl             Number of previous premature labours.
ht              Does the mother have a history of hypertension?
ui              Was there presence of uterine irritability?
ftv             Number of physician visits during the first trimester.

Our Model

Final Model

\[ \begin{aligned} \operatorname{\widehat{bwt}} &= 2838.83 + 90.8(\operatorname{over30}_{\operatorname{yes}}) - 540.69(\operatorname{ui}_{\operatorname{yes}})\ - \\ &\quad 435.37(\operatorname{race}_{\operatorname{black}}) - 301.12(\operatorname{race}_{\operatorname{other}}) - 327.26(\operatorname{smoke}_{\operatorname{yes}})\ - \\ &\quad 567.48(\operatorname{ht}_{\operatorname{1}}) + 4.15(\operatorname{lwt}) - 303.48(\operatorname{ptl}_{\operatorname{1}})\ - \\ &\quad 9.05(\operatorname{ptl}_{\operatorname{2}}) + 1271.6(\operatorname{ptl}_{\operatorname{3}}) \end{aligned} \]

Variable Name   Explanation
bwt             Birth weight in grams.
over30          Is the mother over 30?
ui              Was there presence of uterine irritability?
race            Mother's race (white, black, other).
smoke           Did the mother smoke during pregnancy?
ht              Does the mother have a history of hypertension?
lwt             Mother's weight in pounds at her last menstrual period.
ptl             Number of previous premature labours.

Checking Initial Assumptions: Normality and Linearity

Model Selection

How did we decide on our model?

  • The stepwise method: backward and forward selection
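
As a rough illustration of the backward half of the stepwise idea, the sketch below starts from all predictors and repeatedly drops the one whose removal most improves a score where smaller is better (such as AIC). The function `backward_select`, the scoring function `toy_score`, and the numbers inside it are hypothetical and chosen only to make the example runnable; they are not the procedure or results used for the birthweight model.

```python
from typing import Callable, FrozenSet, Iterable


def backward_select(predictors: Iterable[str],
                    score: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Backward elimination: drop one predictor at a time while the score improves.

    `score` maps a set of predictors to a number where smaller is better.
    """
    current = frozenset(predictors)
    best = score(current)
    while current:
        # Score every model obtained by dropping exactly one predictor.
        candidates = [(score(current - {p}), p) for p in sorted(current)]
        cand_score, dropped = min(candidates)
        if cand_score >= best:
            break  # no single removal improves the score, so stop
        current, best = current - {dropped}, cand_score
    return current


# Hypothetical scoring rule, for illustration only:
# a cost of 10 for each useful predictor that is missing,
# plus a penalty of 2 per predictor kept in the model.
USEFUL = {"smoke", "lwt"}


def toy_score(subset: FrozenSet[str]) -> float:
    return 10 * len(USEFUL - subset) + 2 * len(subset)


selected = backward_select(["smoke", "lwt", "ftv", "age"], toy_score)
print(sorted(selected))
```

Forward selection works the other way round: it starts from an empty model and adds the predictor that improves the score the most at each step.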

Information

Using \(AIC = n\log\left(\frac{RSS}{n}\right) + 2p\): the smaller the AIC, the better the model
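
A minimal sketch of the AIC formula above, assuming the natural logarithm, with n the number of observations, RSS the residual sum of squares and p the number of parameters. The RSS values in the example are hypothetical and are not taken from the birthweight data.

```python
import math


def aic(rss: float, n: int, p: int) -> float:
    """AIC = n * log(RSS / n) + 2p, using the natural logarithm."""
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    return n * math.log(rss / n) + 2 * p


# Hypothetical comparison: dropping one predictor raises RSS slightly
# but saves one parameter, so the smaller model can still win on AIC.
full = aic(rss=1000.0, n=100, p=5)
reduced = aic(rss=1010.0, n=100, p=4)
print(reduced < full)
```

The 2p term is what penalises extra predictors: a variable is only worth keeping if it reduces RSS by enough to offset the penalty.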

Model Selection

Results from Stepwise Procedure

                                               Forward            Backward
Predictors                               Estimates         p Estimates         p
Intercept                                  2834.32    <0.001   2834.32    <0.001
Presence of uterine irritability: Yes      -542.51    <0.001   -542.51    <0.001
Race of Mother: Black                      -445.61     0.002   -445.61     0.002
Race of Mother: Other                      -310.80     0.006   -310.80     0.006
Does Mother Smoke: Yes                     -330.18     0.002   -330.18     0.002
History of Hypertension: Yes               -573.97     0.004   -573.97     0.004
Weight of Mother                              4.31     0.010      4.31     0.010
Number of Previous Premature Labour: 1     -294.43     0.042   -294.43     0.042
Number of Previous Premature Labour: 2      -15.49     0.958    -15.49     0.958
Number of Previous Premature Labour: 3     1266.34     0.055   1266.34     0.055
Observations                                             189                 189
R2 / R2 adjusted                               0.275 / 0.238       0.275 / 0.238
AIC                                                 2988.407            2988.407
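
The adjusted R2 in the table can be checked from the reported R2 with the usual formula \(1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). The sketch below assumes n = 189 observations and p = 9 coefficients excluding the intercept, counted from the rows of the table.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for a model with p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Inputs taken from the stepwise results table; p = 9 is our count of
# non-intercept coefficients and is an assumption of this sketch.
value = adjusted_r2(r2=0.275, n=189, p=9)
print(round(value, 4))
```

The result lands close to the reported 0.238. A small difference in the last digit is expected because the input 0.275 is itself rounded.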

Model Selection

Limitations of Stepwise Method

  • The stepwise method is efficient and gives us a “statistical” criterion for picking variables
  • However, it is not a perfect method: it relies on the computer to pick variables without considering what they actually mean
  • It is best to back up the selection with further research

From biostatistician, Ronan Conroy

“Personally, I would no more let an automatic routine select my model than I would let some best fit procedure pack my suitcase.”

Model Selection

Adding Age into our Model

  • Mothers over 30 are more likely to give birth to an underweight infant
  • Created a categorical variable named over30. Mothers over 30 are set to True and mothers aged 30 or under are set to False (Bergen et al., 2022)
  • Performed the regression again with the new variable
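
A minimal sketch of how an indicator like over30 could be derived from the age variable. It assumes "over 30" means strictly older than 30, and the ages listed are hypothetical examples rather than rows from the dataset.

```python
def over30(age: int) -> bool:
    """True if the mother is older than 30, False otherwise."""
    return age > 30


# Hypothetical ages, for illustration only.
ages = [19, 30, 31, 45]
flags = [over30(a) for a in ages]
print(flags)  # [False, False, True, True]
```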

Model with Age

\[ \begin{aligned} \operatorname{\widehat{bwt}} &= 2838.83 + 90.8(\operatorname{over30}_{\operatorname{yes}}) - 540.69(\operatorname{ui}_{\operatorname{yes}})\ - \\ &\quad 435.37(\operatorname{race}_{\operatorname{black}}) - 301.12(\operatorname{race}_{\operatorname{other}}) - 327.26(\operatorname{smoke}_{\operatorname{yes}})\ - \\ &\quad 567.48(\operatorname{ht}_{\operatorname{1}}) + 4.15(\operatorname{lwt}) - 303.48(\operatorname{ptl}_{\operatorname{1}})\ - \\ &\quad 9.05(\operatorname{ptl}_{\operatorname{2}}) + 1271.6(\operatorname{ptl}_{\operatorname{3}}) \end{aligned} \]

Checking Assumptions

Linearity

  • Assumed from data collection design

Homoskedasticity

  • Can be seen from the following figure

Checking Assumptions

Independence

  • Assumed from data collection design

Normality

  • Can be seen from the following figure

Model Interpretation

Final Model

\[ \begin{aligned} \operatorname{\widehat{bwt}} &= 2838.83 + 90.8(\operatorname{over30}_{\operatorname{yes}}) - 540.69(\operatorname{ui}_{\operatorname{yes}})\ - \\ &\quad 435.37(\operatorname{race}_{\operatorname{black}}) - 301.12(\operatorname{race}_{\operatorname{other}}) - 327.26(\operatorname{smoke}_{\operatorname{yes}})\ - \\ &\quad 567.48(\operatorname{ht}_{\operatorname{1}}) + 4.15(\operatorname{lwt}) - 303.48(\operatorname{ptl}_{\operatorname{1}})\ - \\ &\quad 9.05(\operatorname{ptl}_{\operatorname{2}}) + 1271.6(\operatorname{ptl}_{\operatorname{3}}) \end{aligned} \]

  • Based on our model, holding everything else constant:
    • If the mother has uterine irritability, predicted birth weight decreases by about 541 g
    • If the mother smokes during pregnancy, predicted birth weight decreases by about 327 g
    • If the mother has a history of hypertension, predicted birth weight decreases by about 567 g

Model Interpretation

Final Model

\[ \begin{aligned} \operatorname{\widehat{bwt}} &= 2838.83 + 90.8(\operatorname{over30}_{\operatorname{yes}}) - 540.69(\operatorname{ui}_{\operatorname{yes}})\ - \\ &\quad 435.37(\operatorname{race}_{\operatorname{black}}) - 301.12(\operatorname{race}_{\operatorname{other}}) - 327.26(\operatorname{smoke}_{\operatorname{yes}})\ - \\ &\quad 567.48(\operatorname{ht}_{\operatorname{1}}) + 4.15(\operatorname{lwt}) - 303.48(\operatorname{ptl}_{\operatorname{1}})\ - \\ &\quad 9.05(\operatorname{ptl}_{\operatorname{2}}) + 1271.6(\operatorname{ptl}_{\operatorname{3}}) \end{aligned} \]

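  • Based on our model, holding everything else constant:
    • If the mother is over 30, predicted birth weight increases by about 91 g
    • Compared with no previous premature labours, one previous premature labour lowers predicted birth weight by about 303 g and two by about 9 g, while three raises it by about 1272 g, so the effect is not a steady increase
    • For each additional pound of the mother's weight at her last menstrual period, predicted birth weight increases by about 4.15 g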
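
To make the "holding everything else constant" reading concrete, the sketch below evaluates the final model equation for a given mother. The coefficients are copied from the equation; the function name `predict_bwt`, its argument names, and the example mother (130 lb, white, no other risk factors) are hypothetical choices for illustration.

```python
# Coefficients copied from the final model equation (grams).
COEF = {
    "intercept": 2838.83, "over30": 90.8, "ui": -540.69,
    "race_black": -435.37, "race_other": -301.12, "smoke": -327.26,
    "ht": -567.48, "lwt": 4.15,
    "ptl_1": -303.48, "ptl_2": -9.05, "ptl_3": 1271.6,
}


def predict_bwt(lwt: float, over30: bool = False, ui: bool = False,
                race: str = "white", smoke: bool = False,
                ht: bool = False, ptl: int = 0) -> float:
    """Predicted birth weight in grams from the final model equation."""
    if race not in ("white", "black", "other"):
        raise ValueError("race must be 'white', 'black' or 'other'")
    if ptl not in (0, 1, 2, 3):
        raise ValueError("ptl must be 0, 1, 2 or 3")
    y = COEF["intercept"] + COEF["lwt"] * lwt
    if over30:
        y += COEF["over30"]
    if ui:
        y += COEF["ui"]
    if race != "white":  # white is the reference level
        y += COEF["race_" + race]
    if smoke:
        y += COEF["smoke"]
    if ht:
        y += COEF["ht"]
    if ptl > 0:  # zero previous premature labours is the reference level
        y += COEF["ptl_" + str(ptl)]
    return y


baseline = predict_bwt(lwt=130)
smoker = predict_bwt(lwt=130, smoke=True)
print(f"{baseline:.2f} {smoker:.2f} {smoker - baseline:.2f}")
```

Changing only the smoke argument shifts the prediction by exactly the smoke coefficient, which is what each bullet above describes for its own variable.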

Model Interpretation

Implications of our Model

  • Gives us a basic understanding of factors that can affect an infant’s birthweight
  • Helps devise prevention strategies
  • Opens up various research ideas e.g. why do non-white infants tend to be lower in weight? Is there a bias towards white infants in the healthcare system?
  • However, more research needs to be done

Assessing Performance of Our Model

In Sample Performance: model \(R^2\)

[1] 0.2761181


Out of Sample Performance: Cross Validation Method

  • Separated the dataset into 10 groups
  • Fit our model using 9 of these groups
  • Tested the model on the remaining group, repeating so that each group is used once for testing
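
The splitting step can be sketched as below. In standard 10-fold cross validation the fit-and-test cycle is repeated ten times so that each group serves once as the test set. The function `k_fold_indices` and the fixed seed are hypothetical, and the model fitting itself is left as a comment.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) row indices for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffle rows before splitting
    folds = [idx[i::k] for i in range(k)]   # k groups of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 189 rows, as in the dataset described above.
splits = list(k_fold_indices(189, k=10))
for train, test in splits:
    # Fit the model on `train` rows and compute errors on `test` rows here.
    pass
print(len(splits), sorted({len(test) for _, test in splits}))
```

With 189 rows the groups cannot all be the same size, so nine groups hold 19 rows and one holds 18.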

Assessing Performance of Our Model

Results from Cross Validation

RMSE       Rsquared    MAE
655.3227   0.2189425   533.4679
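
For readers unfamiliar with these three measures, the sketch below computes them from observed and predicted values. It assumes Rsquared is the squared correlation between observed and predicted values, which is one common definition and may differ from the one used to produce the table. The birth weights in the example are hypothetical.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error: penalises large misses more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error: the average size of a miss."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Hypothetical birth weights in grams, for illustration only.
observed = [2500, 3000, 3500, 4000]
predicted = [2700, 2900, 3600, 3800]
print(rmse(observed, predicted), mae(observed, predicted),
      r_squared(observed, predicted))
```

Both RMSE and MAE are in grams, so the values in the table say the model's predictions typically miss by roughly half a kilogram or more.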

Limitations of Our Model

Issues with our data

  • The data was sourced from a single hospital and was gathered in 1986, so it might be outdated.
  • With a limited sample size of just 189, the data might be vulnerable to inconsistencies.
  • Generalising these findings to a broader population may be unwise.

Limitations of Our Model

Is Linear Regression Accurate?

  • Linear regression is useful since it is easy to understand and implement
  • However, it can oversimplify complex data, e.g. is it actually possible to predict an infant’s birth weight from these factors alone?
  • So its predictions can be unrealistic

Conclusion

  • Performed multiple linear regression with predictors selected through the stepwise method, plus an added age indicator (over30)
  • The regression models risk factors associated with low infant birth weight
  • \(R^2\) is about 0.28, so the model explains only a limited share of the variation in birth weight and further research is needed

References

Bergen, N. E., Jaddoe, V. W. V., Timmermans, S., Hofman, A., Lindemans, J., Russcher, H., Raat, H., Steegers-Theunissen, R. P. M., & Steegers, E. A. P. (2020). Homocysteine and folate concentrations in early pregnancy and the risk of adverse pregnancy outcomes: The Generation R Study. BJOG: An International Journal of Obstetrics & Gynaecology, 117(6), 731-737. https://doi.org/10.1111/j.1471-0528.2010.02513.

Tarr, G. (2023). Sydney Quarto Presentation Repository. GitHub. https://github.com/garthtarr/sydney_quarto

https://www.statisticssolutions.com/stepwise-regression-what-is-it-and-should-you-use-it/